Combining kNN Imputation and Bootstrap Calibrated Empirical Likelihood for Incomplete Data Analysis

Authors

  • Yongsong Qin
  • Shichao Zhang
  • Chengqi Zhang
Abstract

k-nearest neighbor (kNN) imputation, one of the most important research topics in incomplete data discovery, has been applied with great success to industrial data. However, it is difficult to obtain a mathematically valid yet simple procedure for constructing confidence intervals to evaluate the imputed data. This chapter studies a new estimation method for missing (or incomplete) data that combines kNN imputation with bootstrap calibrated empirical likelihood (EL). The combination not only removes the burden of seeking a mathematically valid asymptotic theory for kNN imputation, but also inherits the advantages of the EL method over the normal approximation method. Simulation results demonstrate that the bootstrap calibrated EL method performs quite well in constructing confidence intervals for data imputed with the kNN method. DOI: 10.4018/978-1-61350-474-1.ch016
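The combination described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the single covariate `x`, the choice `k=3`, resampling of (x, y) pairs, the bisection solver, and all function names are assumptions made purely for illustration. The sketch imputes missing responses by kNN, computes the empirical likelihood ratio for the mean, and takes the critical value from a bootstrap quantile instead of a chi-square table.

```python
import numpy as np


def knn_impute(x, y, k=3):
    """Replace NaN entries of y by the mean y of the k nearest observed x."""
    y = np.asarray(y, dtype=float).copy()
    observed = ~np.isnan(y)
    x_obs, y_obs = x[observed], y[observed]
    for i in np.flatnonzero(~observed):
        nearest = np.argsort(np.abs(x_obs - x[i]))[:k]
        y[i] = y_obs[nearest].mean()
    return y


def el_log_ratio(z, theta, iters=100):
    """-2 log empirical likelihood ratio for the mean of z, evaluated at theta."""
    d = z - theta
    if d.min() >= 0 or d.max() <= 0:
        return np.inf  # theta lies outside the range of the data
    # The Lagrange multiplier lam solves sum(d / (1 + lam * d)) = 0 on the
    # open interval where every 1 + lam * d_i is positive. The left-hand side
    # is strictly decreasing in lam, so plain bisection finds the root.
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(d / (1.0 + mid * d)) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * d))


def bootstrap_el_critical_value(x, y, k=3, alpha=0.05, n_boot=200, seed=0):
    """Bootstrap the EL statistic at theta_hat to calibrate the cutoff."""
    rng = np.random.default_rng(seed)
    n = len(y)
    theta_hat = knn_impute(x, y, k).mean()
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample incomplete pairs
        z_star = knn_impute(x[idx], y[idx], k)  # re-impute each resample
        stats[b] = el_log_ratio(z_star, theta_hat)
    return theta_hat, np.quantile(stats, 1.0 - alpha)


# Synthetic example (assumed model): y = 1 + 2x + noise, about 30% of y missing.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 100)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, 100)
y[rng.random(100) < 0.3] = np.nan

theta_hat, crit = bootstrap_el_critical_value(x, y)
z = knn_impute(x, y)
# A candidate mean theta belongs to the interval when its EL statistic
# does not exceed the bootstrap critical value.
print(theta_hat, crit, el_log_ratio(z, 2.0) <= crit)
```

The confidence interval is the set of candidate means whose statistic stays at or below `crit`; scanning a grid of `theta` values with `el_log_ratio` would trace out its endpoints. Calibrating by bootstrap avoids relying on a limiting distribution for the imputed data, which is the difficulty the abstract points to.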

Similar resources

An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods

Imputation is one of the most common methods to reduce item nonresponse effects. Imputation results in a complete data set, and then it is possible to use naïve estimators. After using most of the common imputation methods, mean and total (imputation estimators) are still unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...

Semiparametric Regression Analysis under Imputation for Missing Response Data

We develop inference tools in a semiparametric regression model with missing response data. A semiparametric regression imputation estimator, a marginal average estimator and a (marginal) propensity score weighted estimator are defined. All the estimators are proved to be asymptotically normal, with the same asymptotic variance. They achieve the semiparametric efficiency bound in the homoskedas...

An Ensemble approach on Missing Value Handling in Hepatitis Disease Dataset

The major work in data pre-processing is handling missing value imputation in hepatitis disease diagnosis, which is one of the primary stages in data mining. Many health datasets are typically imperfect. Simply removing the incomplete cases from the original datasets can create more problems than solutions. An appropriate technique for missing value imputation can help to generate high-quality datasets fo...

Empirical Likelihood Approach and its Application on Survival Analysis

A number of nonparametric methods exist for studying a population and its parameters when the distribution is unknown. Some of them, such as the "resampling bootstrap method", are based on resampling from an initial sample. In this article, the empirical likelihood approach is introduced as a nonparametric method for more efficient use of auxiliary information to construct...

Hyperbolic Cosine Log-Logistic Distribution and Estimation of Its Parameters by Using Maximum Likelihood Bayesian and Bootstrap Methods

In this paper, a new probability distribution, based on the family of hyperbolic cosine distributions, is proposed and its various statistical and reliability characteristics are investigated. The new category of HCF distributions is obtained by combining a baseline F distribution with the hyperbolic cosine function. Based on the base log-logistic distribution, we introduce a new di...

Journal title:
  • IJDWM

Volume 6  Issue 

Pages  -

Publication date 2010